Branch prediction and simultaneous multithreading

Authors

  • Sébastien Hily
  • André Seznec
Abstract

In this paper, we examined the behavior of three of the best-performing branch prediction strategies when several instruction threads execute simultaneously. We studied the impact of adding one Return Address Stack per hardware context, and showed that a 12-deep stack per thread is sufficient to greatly improve branch prediction accuracy at minimal implementation cost. We explored the behavior of the branch predictors both when independent applications run simultaneously and when the workload is a parallel program. Our simulations showed that in a multiprogramming environment, if the sizes of the tables (PHT/BTB) are proportional to the number of active threads, there are very few interactions, whether destructive or constructive. With parallel workloads, a beneficial sharing effect might have been expected. In fact, this depends strongly on the branch predictor, and even in the best case the gains remain very limited. Finally, we showed that for all three predictors, in both multiprogramming and parallel processing, keeping the tables small causes a slight increase in mispredictions, mostly due to an increase in conflicts in the BTB.

The work of Sébastien Hily is partly funded by the Brittany region (région Bretagne).
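The per-context Return Address Stack mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' simulator: the class names, the `on_call`/`on_return` interface, and the overflow policy (silently dropping the oldest entry when the stack is full) are all assumptions made for illustration. Only the depth of 12 entries per thread comes from the abstract.

```python
from collections import deque


class ReturnAddressStack:
    """Fixed-depth return address stack (hypothetical model).

    Assumption: when the stack is full, a push drops the oldest entry.
    """

    def __init__(self, depth=12):
        # deque with maxlen discards from the opposite end on overflow
        self.entries = deque(maxlen=depth)

    def push(self, return_addr):
        # On a call: remember the address of the instruction after the call.
        self.entries.append(return_addr)

    def pop(self):
        # On a return: predict the most recently pushed address,
        # or None if the stack is empty (no prediction available).
        return self.entries.pop() if self.entries else None


class PerThreadRAS:
    """One private stack per hardware context, so threads cannot
    corrupt each other's return predictions (hypothetical model)."""

    def __init__(self, num_threads, depth=12):
        self.stacks = [ReturnAddressStack(depth) for _ in range(num_threads)]

    def on_call(self, thread_id, return_addr):
        self.stacks[thread_id].push(return_addr)

    def on_return(self, thread_id):
        return self.stacks[thread_id].pop()


if __name__ == "__main__":
    ras = PerThreadRAS(num_threads=2)
    ras.on_call(0, 0x100)   # thread 0 calls
    ras.on_call(1, 0x200)   # thread 1 calls in between
    ras.on_call(0, 0x104)   # thread 0 makes a nested call
    print(hex(ras.on_return(0)), hex(ras.on_return(1)), hex(ras.on_return(0)))
```

With a single stack shared by both threads, the interleaved call from thread 1 would sit between thread 0's two entries and the returns would be mispredicted; with one stack per context, each thread pops only its own addresses.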

Similar resources

Evaluating Branch Predictors on an SMT Processor

Simultaneous multithreading (SMT) provides significant increases in microprocessor throughput by issuing instructions from multiple threads per clock cycle. SMT can be realized in a wide-issue superscalar with a modest increase in resources, because much of the hardware is shared among the multiple thread contexts. Branch prediction accuracy, a key component of microprocessor performance, can s...

A latency-conscious SMT branch prediction architecture

Executing multiple threads has proved to be an effective way to partially hide the latencies that appear in a processor. When a thread is stalled because a long-latency operation is being processed, such as a memory access or a floating-point calculation, the processor can switch to another context so that another thread can take advantage of the idle resources. However, fetch stall conditions cau...

Improving Conditional Branch Prediction on Speculative Multithreading Architectures

Dynamic conditional branch prediction is an indispensable technique for increasing performance in modern processors. However, currently proposed schemes suffer from loss of accuracy when applied to speculative multithreading CMP architectures. In this paper, we quantitatively investigate this problem and present a hardware scheme to improve the prediction accuracy. Evaluation results show that ...

Tolerating Branch Predictor Latency on SMT

Simultaneous Multithreading (SMT) tolerates latency by executing instructions from multiple threads. If a thread is stalled, its resources can be used by other threads. However, fetch stall conditions caused by multi-cycle branch predictors prevent SMT from achieving its full potential performance, since the flow of fetched instructions is halted. This paper proposes and evaluates solutions to deal with...

Simultaneous Speculation Scheduling - A Technique for Speculative Dual Path Execution

Commodity microprocessors uniformly apply branch prediction and single path speculative execution to all kinds of program branches and suffer from the high misprediction penalty which is caused by branches with low prediction accuracy and, in particular, by branches that are unpredictable. The Simultaneous Speculation Scheduling (S3) technique removes such penalties by a combination of compiler...

Publication date: 1996